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ABSTRACT 

We propose an ambitious new method that models the intracluster medium in clusters of galaxies as 
a set of X-ray emitting smoothed particles of plasma. Each smoothed particle is described by a handful 
of parameters including temperature, location, size, and elemental abundances. Hundreds to thousands 
of these particles are used to construct a model cluster of galaxies, with the appropriate complexity 
estimated from the data quality. This model is then compared iteratively with X-ray data in the form 
of adaptively binned photon lists via a two-sample likelihood statistic and iterated via Markov Chain 
Monte Carlo. The complex cluster model is propagated through the X-ray instrument response using 
direct sampling Monte Carlo methods. Using this approach the method can reproduce many of the 
features observed in the X-ray emission in a less assumption-dependent way than traditional analyses, 
and it allows for a more detailed characterization of the density, temperature, and metal abundance 
structure of clusters. Multi-instrument X-ray analyses and simultaneous X-ray, Sunyaev-Zeldovich (SZ), 
and lensing analyses are a straight-forward extension of this methodology. Significant challenges still 
exist in understanding the degeneracy in these models and the statistical noise induced by the complexity 
of the models. 

Subject headings: methods: data analysis — X-rays: galaxies: clusters 



1. MOTIVATION 

There are a large class of data analysis problems in as- 
trophysics involving a model of a source that has a spec- 
trum that varies spatially on the sky. In the case of the 
X-ray emission from clusters of galaxies, where the individ- 
ual photon energies are recorded, the problem is especially 
prominent. In fact, much of the recent work in the X-ray 
analysis of clusters of galaxies has had the aim of obtain- 
ing a complete description of the density and temperature 
structure of the intracluster medium. 

The X-ray emission from clusters of galaxies has been 
recently shown, using observations with the Chandra and 
XMM-Newton telescopes, to have a very complex distri- 
bution. In general, the surface brightness lacks circular 
symmetry. For example, cold fronts, or cluster gas that 
"sloshing" relative to the dark matter gravitational po- 
tential, is a significant perturbation f rom a sp herically- 
symmetric gas distribution (Ma rkevitch et al.|[2Q QQ). "Ra- 
dio bubbles" , which are displacements of cluster gas by rel- 
ativistic radio-emitting plasma, have complex morpholo- 
gies and vary from cluster to cluster. Sanders et al. (2004) 
have identified a "swirl" in the temperature structure of 
the Perseus cluster possibly due to rotation of the cluster 
gas. There may also b e ripples or waves in the cluster 

fis (Fabian et al. 2003). Neither are clusters isothermal: 
aastra et al. (2003) and Buo te et all (|2003[ ) have ques- 
tioned the assumption that the cluster gas can even ap- 
proximately be described by a single phase as a function 
of radius. In fact, they find evidence that the tempera- 
ture can vary at a single radius by a larger factor than 
the average temperature drops radially across the entire 
cluster. 



Recently, the gross spectrum from clusters of galaxies 
was shown to be inconsistent wi th simple models of cooling 
flows (e.g. IPeterson et al.ll2QQl[ ). The confusion over what 
this result means clearly precludes the use of any simple 
theoretical template in the construction of a model of the 
intracluster medium. Our goal in this paper is therefore 
to provide a flexible method that is capable of reproduc- 
ing all of the complexities we have mentioned above. This 
method must necessarily utilize both the spectral and spa- 
tial information in a dataset and should still be effective 
with datasets that have relatively low numbers of X-ray 
counts. 

In the literature is a large number of diverse techniques 
for analyzing X-ray data from galaxy clusters. The meth- 
ods range from fltting isothermal spectra to square grids of 
photo ns across the detector plane (e.g. iMarkevitch et all 
I2000D to imaging deproj ection techniques through "onion - 
peeling" methods (e.g. iFabian, Cowie fc Grindlavlll98ll ). 
Fitting adaptively-binned data to isothermal models has 
also proved useful (Sanders et al. 2004) and the onion- 
peeling techn ique has been extended to spec t roscopic 
deproiection (|Arab adiis. Bautz fc Garmire 2002; "Arnau^ 
I2OOII : kaastra et a l. 2003; Andersson fc Madejski 2004). 
Other methods include image smoothing, gross spec- 
tral fltting, and inver sion of spectral hardness ratios 
([Sanderson et al. Il2004 ). 

We break with these previous methods in two important 
ways. First, we use the spectrum and image of a source on 
an equal footing. Traditional analyses extract a spectrum 
from a piece of the detector plane and then use the inte- 
grated number of photons as a proxy for the true source 
image. However, this process is only approximately cor- 
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rect in the presence of broad point-spread functions that 
are energy-dependent. Even the concept that there is an 
"image" and a "spectrum" for a spatially-resolved source 
is misleading, since in reality there is a spatially-varying 
emissivity at a given energy. We propose to use a Monte 
Carlo calculation to forward-fold a complex model to pre- 
dict detector positions and CCD energies for each photon. 
Secondly, we construct a model that is flexible enough to 
allow for extremely complex gas temperature and density 
distributions in clusters. For example, a common approach 
is to assume a single temperature for a given set of pho- 
tons and fit for its value. In contrast, our procedure is to 
allow for a multi-temperature distribution, and then allow 
the method to sharpen the temperature distribution if it is 
close to a single temperature. This approach is motivated 
by the observations outlined in the first paragraph that 
clusters are complex. We achieve this complexity by mod- 
eling the emissivity from clusters as using conglomerations 
of emissivity particles, smoothed for ease of interpretation. 

The construction of underlying model distributions from 
linear combinations of simple basis functions is well- 
founded in modern Bayesian im age processing : indeed, 
the "massive inference" method of lSkillina (|l998f ) has been 
found to be a powerful technique for the reconstruction of 
one dimensional spectra and two dimensional images. This 
work may be seen as a practical application of the basic 
premises of massive inference to the problem of astronomi- 
cal X-ray data analysis, with some of the refinements com- 
promised in the name of pragmatism. Skilling et al. coined 
the term "atom" for the indivisible unit of image flux; to 
avoid confusion with the atomic physics terminology of X- 
ray spectroscopy we translate atom into particle and draw 
an analogy with numerical simulations of galaxy clusters. 
Where in the first instance our goal is to model X-ray 
data, it will become clear that this "smoothed particle in- 
ference" (SPI) methodology offers great opportunities for 
further analysis and physical interpretation. This work 
also bor rows heavily from some independent pioneering 
work bv Jernigan fc Vezid (1996). There a Monte Carlo 
approach was used to invert X-ray data by interating a 
non-parametric model, while matching the simulation with 
the X-ray events. 

In this paper, we describe the SPI method in consider- 
able detail, highlighting the two novel aspects introduced 
above. We then apply the method on realistic simulated 
datasets for both EPIC and RGS, and show that it works. 
We then demonstrate some of the benefits of the method- 
ology with regard to extracting and displaying the infor- 
mation in the data. 

In two companion papers, we apply the method 
to XMM-Newton Reflection Grating Spectrometer 

S RGS) ob servations of the Perseus and Virgo clusters 
Peterson et al. 2006) and XMM-Newton European Pho- 
ton Imaging Cam era (EPIC) observation s of Abell 1689 
and RXJ0658-55 (^Andersson et al. II2006D . 

2. METHODOLOGY 

This section is organized as follows. We first describe 
our choice of model, a set of spatially Gaussian X-ray 
plasma emitters of an assigned spectral type. Then, we 
discuss X-ray Monte Carlo techniques to propagate this 
model through the instrument response. We discuss the 
ensuing two-sample likelihood statistic used to assess the 



statistical significance of the model. Finally, we describe 
the use of Markov Chain techniques to iterate the model 
and explore the parameter space. 

2.1. Smoothed Particles in a Luminosity Basis 

We have argued in the previous section that a "non- 
parametric" method, or rather, a method with thousands 
of parameters, is required to describe the current X-ray 
data. In particular, a more unified modeling approach, 
embracing all the complex aspects of clusters that have 
recently been discovered (cold fronts, temperature radial 
gradients, local non-isothermality, lack of circular symme- 
try in the temperature distribution) is now required. 

Consider a "blob", or "smoothed particle", of plasma, 
described by a spatial position, a Gaussian width, a single 
temperature, a set of elemental abundances, and an overall 
luminosity. A model of a cluster of galaxies may be con- 
structed from a set of such particles. There are a number 
of advantages to doing this. The model has enough free- 
dom to reproduce all of the salient features of cluster gas 
distributions, and can form complex density and tempera- 
ture structures. Particles with different temperatures can 
occupy the same position and therefore multi-phase tem- 
perature distributions can be constructed. There is also 
no global spherical symmetry that is imposed on the parti- 
cle configuration, although they are themselves spherically 
symmetric. Another powerful aspect of using smoothed 
particles is that they can be mapped from one space to 
another in a straightforward way: for example a collec- 
tion of smoothed particles representing X-ray luminosity 
may be manipulated in order to derive the corresponding 
SZ decrement map. This approach is somewhat similar 
to the methods used in smoothed particle hydrodynamics 
(SPH), except here the method decide the properties of 
the smoothed particle, rather than a set of hydrodynami- 
cal equations. For this reason, we refer to the method as 
"smoothed particle inference" . 

Assuming an X-ray source is optically-thin, we can write 
the differential luminosity per volume due to a set of 
smoothed Gaussian particles as 



dVdT 
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(1) 



where Li is the luminosity per particle in ergs per sec- 
ond, Ti is the temperature of the zth particle in keV, Da 
is the angular diameter distance in cm, cji is the angular 
size in radians, Xi is the location of the particle in three 
dimensions expressed in cm. We achieve the differential 
luminosity per solid angle by integrating along the line of 
sight, dz^ 
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This can be rewritten as 
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where xi is now the two dimensional spatial position of 
each smoothed particle. The differential flux per unit solid 
angle per unit energy, f^^^ , in units of photons per second 
per cm^ which is the actual observable, is then obtained by 
first calculating the differential flux per unit energy, 
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(4) 

where I^l is the luminosity distance in cm, A is the cool- 
ing function in units of ergscm^s"^, is the differential 
emission measure, and ^ is the line (or continuum power) 
which is predicted uniquely given a set of elemental abun- 
dances and a temperature in units of cm^s~^keV~^ . Note 
that the cooling function is the energy averaged integral 
of The differential flux per solid angle per energy is 
then obtained by combining equation 3 and equation 4, 
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where are angular coordinates on the sky, a is now 
in angular units, I^l is the luminosity distance in cm, 
A{Ti^{Ai}) is the temperature and metallicity-dependent 
cooling function, and ^ is the energy-dependent line (or 
continuum) power function, which describes the probabil- 
ity of a photon having a given energy £^ as a function of 
temperature. 

We have purposely chosen to use luminosity as the ba- 
sis for our smoothed particles, rather than, say, gas den- 
sity. If we had parameterized in terms of the gas density 
then predicting the luminosity would require, essentially, 
a summation over all particles to reconstruct the density 
field and then the squaring of the result. In this case, the 
complete luminosity field would have to be reconstructed 
following the alteration of a single particle. Later, we 
will demonstrate that using a luminosity basis will sim- 
plify the calculation considerably but not quantitatively 
change the results. We are also anticipating that in using 
this method in combination with SZ observations, it will 
still be preferable to use the luminosity as the basis rather 
than a Compton-y parameter basis since the calculation of 
the X-ray instrument response is likely to be much more 
involved. 

2.2. X-ray Monte Carlo Methods 

We convert the luminosity at each energy and spatial 
position into a quantitative prediction for probability of 
the numbers of photons at a given detector position and 
energy via a Monte Carlo calculation. More explicitly, our 
goal is the calculation of the probability of detection, 
given the instrument response function, and the photon 
source model, S. This is expressed as, 

D(u, p)=JdE dn Rin, p\E,n) {E,n) (6) 



where u are the detector position vector, p is the measured 
detector energy, Q is the sky position vector, and E is the 
intrinsic energy of the photon. This is a three-dimensional 
integral: it is most efficiently calcula ted using a Mont e 
Carlo approach. In iPeterson, Jernigan. fc KahnI (|2QQ4 ). 
we describe the calculation of this integral for the Reflec- 
tion Grating Spectrometers (RGS) on XMM-Newton. We 
have also written a companion Monte Carlo for the Euro- 
pean Photon Imaging C ameras (EPIC) on XMM-Newton 
([Andersson et al. ll2QQ6f ). We do not describe these calcu- 



lations in detail here, but instead briefly outline their use 
in the context of smoothed particle simulation. 

In the Monte Carlo calculation, photons are simulated 
sequentially via conditional probability functions. Mir- 
ror, grating, and detector responses are included in the 
calculation. For the RGS simulation, photons are ma- 
nipulated using a set of dispersion and cross-dispersion 
response functions, CCD pulse-height redistribution func- 
tions, and exposure maps. For the EPIC Monte Carlo, 
photons are constructed using the point-spread function, 
vignetting, CCD pulse-height redistribution functions, and 
the exposure map. In general, the functions are both 
energy and off-axis angle dependent and the calcula- 
tion of these functions involves a sequence of convolu- 
tions. The response functions are calculated on coarse 
yids, and then interpolated using a Monte Carlo method 
(|Peterson7jernigan, fc Kahn 2004). Photons are removed 
in proportion to the relative response probability, in or- 
der to properly calculate the effective area, which is also 
off- axis angle and energy dependent. 

For the purpose of simulating the X-ray emission from a 
set of SPI particles, a set of simulated photons is assigned 
to a particular particle. The particle parameters are the 
temperature (T), set of abundances, position ((/), ?/^), Gaus- 
sian width (cr), and possibly the redshift. For each photon, 
the wavelength is first chosen from the probability distri- 
bution described by the emissivity function corresponding 
to the temperature and set of elemental abundances of that 
particle. Its position is then perturbed from the nominal 
center of that particle (^, ip) by a random Gaussian step 
having width equal to a. 

The photon flux per particle, F^, is calculated by com- 
puting. 



N 



(7) 



where Aeff is the peak effective area in units of cm^, r 
is the exposure time in units of s, and N is the number 
of photons simulated before removing any photons in the 
Monte Carlo calculation. The luminosity, Li per particle 
can be calculated by computing. 
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and ^ is the emissivity per energy interval in units of 
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2.3. The Likelihood Function 

We quantify the goodness of fit of a given set of SPI par- 
ticles via the likelihood function of their parameters. In 
general, the likelihood is a function dependent on both pre- 
dicted and actual data, whose form embodies the analyst's 
understanding of the measurement errors, and is under- 
stood as the probability of getting the data given the pa- 
rameters of the model. Ordinarily, the predicted data are 
calculated to arbitrary precision within the context of the 
model under investigation: this is not the case here. The 
Monte Carlo simulation procedure outlined above, while 
needed to deal correctly with the energy-dependent and 
spatially varying telescope response, provides predicted 
data that are themselves uncertain, being the outcome of a 
Poisson process. We can view the output of the telescope, 
and the output of the simulation, as two independent data 
streams emanating from the same underlying physical pro- 
cess (emission from the cluster followed by detection at 
the telescope). The likelihood function then quantifies the 
misfit between the two datasets. 

Note that given infinite simulation time, predicted data 
can be computed with infinite precision within the con- 
text of the limitations of the instrument model; clearly any 
likelihood statistic used should revert to the more familiar 
form in this limit. Moreover, we note that all likelihood 
functions are necessarily approximations to an unknown 
function (that indeed, should itself form part of the data 
model) . We should be satisfied with a likelihood that cap- 
tures the essence of the misfit, and satisfies any limits such 
as that above. Statistics for the two Poisson sample sit- 
uation have not been well-developed in the statistical lit- 
erature; possibly it has been deemed preferable to find al- 
ternative ways of computing predicted data exactly rather 
than press on with effectively two sources of measurement 
error. Where attempts have been made, they have made 
use of Gaussian approximations valid only in the limit of 
large numbers. In the appendix we give a discussion of a 
number of routes to a likelihood function applicable to the 
low counting statistics of X-ray datasets, and show how 
they recover the correct form in the limit of infinite simu- 
lation time. We give below the form of the likelihood L2 
that we use in practice, and note that in its normalized 
form (I/2Ar) that — 21ogL2Ar is distributed as with the 
number of degrees of freedom equal to the number of bins, 
while not required by the analysis, we find that this pro- 
vides a valuable check on the pure goodness of fit of the 
models, giving confidence in the results. 

In general, we find that the results are relatively insen- 
sitive to the exact choice of likelihood statistic and the dif- 
ferences are largest when the data is extremely sparse and 
there are few photons: in subsection 12.3.21 below we out- 
line our approach for avoiding this eventuality. We exper- 
imented with the likelihood functions in the next section 
and found the reconstruction to produce similar results. 

2.3.1. Two Sample Likelihood Statistic 

The basic datum is a number of photons in a three 
dimensional bin {ui^U2^p). The bin size may be set by 
physical limits such as the extent of the CCD pixel, or 
be imposed given some understanding of, for example, the 
spectral resolution of the instrument. The simulated pho- 
tons are then binned on the same grid. Assume that we 



have n photons in the observed dataset, and have simu- 
lated m photons from our model: in the i^^ bin we have 
Ei simulated photons, and Oi observed photons. Since the 
photons falling into a set of bins will follow a multinomial 
distribution, we obtain for the data photons the likelihood 
function 



io = n! n 



Oi 



Oil 



(10) 



where is the probability of getting a photon in that 
bin. The multinomial distribution is the Poisson distribu- 
tion but with total number of photons fixed. Treating the 
simulated photons in the same way, we get 
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In doing so we have suggested that the simulated photons 
be considered to be from a separate, independent, experi- 
ment with its own likelihood. The joint likelihood is then 
just the product of the two experiments' likelihoods. In the 
appendix we discuss two routes to dealing with the value of 
the parameters A^: one can see that a reasonable guess is to 
estimate A^ from the data and set A^ = {Oi -\- Ei)/{m-\-n): 
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(12) 



We use this form when comparing exploring the parameter 
space as outlined in the next section. It is possible to re- 
normalize these likelihood functions, such that they will 
approximately follow the distribution ( Cowan 1998^ 
This is done by dividing by the same expression with the 
probability estimates replaced by the actual number of 
counts. The result for the likelihood given above is writ- 
ten as 



^2N 
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\ n / \ m / 



(13) 

By construction, — 21ogL2Ar is approximately distributed 
as with the number of degrees of freedom equal to the 
number of bins. We use L2N for checking that a reasonable 
fit is being obtained by the method (see also section [233]), 
which will be near the number of degrees of freedom for a 
good fit. In the appendix we demonstrate how this statis- 
tic approaches the two sample x^ statistic that we used 
in the large m and n limit. L2 is used for the parameter 
iteration, since it analgous to the one-sample likelihood. 

2.3.2. Three- Dimensional Adaptive Binning 

The statistics that we have discussed in the previous 
section require the events to be binned on a arbitrary 
three-dimensional grid. For a non-dispersive spectrome- 
ter, the natural minimal grid for this would be to design 
bins to reasonably sample the point-spread function for the 
two spatial dimensions and the energy resolution kernel 
for the spectral dimension. For a dispersive spectrometer, 
the angular dispersion resolution kernel would define the 
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bins in the dispersion direction. In either case, however, 
it is found that, for virtuahy ah observations of clusters of 
galaxies, the numbers of minimal bins are very large in- 
deed, and that the photons fill the three dimensional grids 
extremely sparsely. In fact, most bins typically contain 
either or 1 photons. Simple grouping of these minimal 
bins will result in severe loss of spectral and/or spatial 
resolution. Therefore, a more efficient binning approach 
deserves some consideration. Although the statistics dis- 
cussed in the previous section are capable of handling low 
numbers of photon counts, we are pushed toward fuller 
bins to reduce computation time. Additionally, we show 
below that the fiuctuation induced on the statistic by the 
Monte Carlo calculation depends on the number of bins, 
such that a smaller number of bins is preferred. 

For these reasons, we have developed an algorithm to 
adaptively bin the data in the three dimensions similar to 
other adaptive binning methods. To do this, we first con- 
sider the entire three-dimensional data space. We create 
the binning grid by bisecting the data space in one dimen- 
sion at a time. The dimension to bisect is chosen from the 
relative length of the bin in each dimension. The longer 
dimension is always bisected. If the relative sizes are the 
same the dimension is selected randomly. Each bin is bi- 
sected for as long as any bin has more than N counts; most 
of the bins will contain on average N/2 photon counts by 
the time the binning has been completed. Typically, N 
is chosen to be 10 or 20, but this varies between applica- 
tions. For instance in a dataset with extremely low fiux 
the bins can be made with fewer photons. There is also 
the possibility of weighting the bin size in any dimension 
with some factor n. For example, if spectral accuracy is 
preferred over spatial accuracy the bins in the spectral di- 
mension can be chosen to be on average n times smaller 
than the spatial bins. After the bins have been determined 
for the data photons, they remain fixed. We then use a 
binary tree algorithm to rapidly place new simulated pho- 
tons into the appropriate bin. The two-sample statistic 
given in the previous section can then be calculated as a 
product over all the bins. 

2.4. Parameter Iteration 

After model generation and likelihood calculation, the 
third major component of the method is the sequence of 
model parameter iteration. We have found several compli- 
cations to the usual methods of parameter iteration that 
are particularly problematic with the high-dimensional pa- 
rameter model that we are considering in this paper. In 
the following subsections we discuss several aspects of this 
part of the analysis chain. 

2.4.1. Markov Chain Monte Carlo Exploration 

We explore the parameter space of the smoothed 
particle model with the Markov Chain Monte Carlo 
(MCMC) method, the use of which has recently become 
widespread in the field of Bayesian d ata analysis (see 
iGilks, Richards on fc S piegelhalterl [20031 . for an excellent 
introduction). MCMC is a technique for sampling the 
probability distribution of a model's parameters; in this 
case, the dimensionality of the parameter space is ex- 
tremely high, prohibiting brute force grid calculations and 
condemning gradient-based optimization to failure by be- 
ing trapped in local likelihood maxima. MCMC is a 



very efficient and powerful way of exploring such high- 
dimensional spaces, returning a list of sample cluster mod- 
els all of whom are consistent with the data, and whose 
number density in parameter space follows the probability 
density. 

Possession of the entire parameter probability distri- 
bution, in the form of a list of samples, opens up pos- 
sibilities not available to other methods. The distribu- 
tions of particle parameters, for example, can be used 
to estimate the average distributions of cluster quanti- 
ties (such as temperatures and densities). These aver- 
ages come with well-defined uncertainties trivially derived 
from the samples: one may probe the range of distribu- 
tions that are consistent with the data. MCMC is now 
the standard tool for cosmological data an alysis (see e.g. 
iLewis fc Bridie 120021 : iTegmark et al.l l2004D . MCMC has 
been applied to simple clu ster modeling using SZ and lens- 
i ng d ata (Ma rshall et al.l 120031) . and X-ray and SZ data 
(|Bonamente et~alll2004 ). To date, the use of MCMC in 
astrophysics has been restricted to relatively modest num- 
bers of parameters. However, it is when using large num- 
bers of parameters to model complex datasets that the full 
power of the technique is realised. 

As usual, the target probability distribution is the pos- 
terior density P, given by Bayes' theorem as 

Pm\D)<xL{D\{e})nm). (14) 

Here, L is the likelihood function introduced above, with 
the notation emphasizing its dependence on the observed 
data D given the parameter set {0}. tt is the prior prob- 
ability distribution of the model parameters: in this work 
we use uninformative priors on all parameters, restricting 
ourselves to uniform distributions over broad finite ranges. 

We give here the briefest o f introductions t o the 
Metropolis-Ha stings algorithm ([Metropolis et al] Il953l : 
lHastingslll970f ) that we use to sample the posterior den- 
sity, and then provide details of the implementation in the 
following subsections. Following the evaluation of the like- 
lihood associated with an initial point in parameter space. 
Pi = (^1, ^2, • • • , ^n), a new point, P2 is drawn from a pro- 
posal distribution. This proposal distribution is required 
to be symmetric about the current position to ensure suc- 
cessful sampling of the target distribution. If the likelihood 
of the new point is higher than the likelihood of Pi then 
the new point is accepted. Otherwise, the point is accepted 
only if a uniform random number between and 1 is less 
than the ratio of likelihoods, i}^pj^ • This is known as the 
Metropolis-Hastings acceptance criterion. If the new point 
is not accepted, then the old point is repeated in the chain 
of parameters. While this basic algorithm is very simple, 
the efficiency of the exploration of the parameter space 
depends strongly on the form of the proposal distribution, 
which we discuss in the following two subsections. 

2.4.2. Adaptive Proposal Distribution 

As we mentioned in the previous section, the method of 
picking the next point in the Markov chain can either be 
by taking a step from some fixed distribution or it can be 
modified dynamically. In dealing with the large number 
of parameters in the smoothed particle fits, we have found 
that an adaptive proposal distribution is necessary. Ini- 
tially, the proposal distribution is chosen to be a Gaussian 
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of fixed width, cr. After the Markov chain has taken a few 
steps, we calculate the standard deviation, a', of the new 
estimate of the target posterior distribution. We then use 
a' as an estimate for the new step in the proposal distri- 
bution. Large excursions in parameter space, however, 
can sometimes bias this standard deviation estimation. 
Therefore, we have constructed a different standard de- 
viation estimator with some built-in "memory loss" , such 
that earlier steps are weighted less strongly. The proposal 
step width, a" is calculated according to 

1^ E.(i + E)' (E,(i + f)')' I 

where pi is the parameter value at the i^^ iteration and the 
sum is over all the iterations, e is a constant taken to be 
0.01 that causes the earlier iterations to be de-weighted. 
If e were zero, the formula would result to the normal 
standard deviation formula; an infinite value for epsilon 
implies the normal non-adaptive Gaussian proposal distri- 
bution. With finite e, the individual steps no longer ex- 
actly satisfy detailed balance, as the proposal distribution 
is slightly different between the current and trial sample 
positions. If epsilon is set sufficiently small (although not 
so small as to restrict the movement of the chain) then a 
large number of past sample positions are used to deter- 
mine the proposal distribution widths, and consequently 
the differences between any successive proposal distribu- 
tions are very small. The parameter update method then 
asymptotically becomes a true Markov chain. In prac- 
tice these differences are completely negligible compared 
to the large deviations from unity of the likelihood ra- 
tios. Quantitatively, we found that this holds whenever 
2 A log L ^ ~ e. In practice 2 A log L is much larger 
than one and we take e to be 0.01. 

The proposal step is then taken to be 2.38cr''(i~ 2 ^ where 
d is the number of dimensions. This is the optimal step 
size for uncorrelated Gaussian distributions, such that the 
acceptance rate times the RMS step size is maximized for 
this value. In the large d limit, a rejection rate around 77% 
is expected (Ro berts, Gelman, fc Gilks 1997) . A success- 
ful chain will, after the burn-in period, explore the poste- 
rior distribution with this rejection rate, and the proposal 
distribution in any given parameter direction will asymp- 
tote to the Gaussian approximation to the marginalized 
posterior. 

2.4.3. Global and local parameters iteration sequence 

In a typical SPI emission model there are two kinds of 
parameters. The first kind are local parameters, describ- 
ing the individual properties of the particles. These are the 
temperature, position, size, and elemental abundances for 
each particle. The second kind are global parameters for 
describing properties of the entire data set. These param- 
eters are the background parameters and the absorption 
column density. The order in which these parameters are 
varied was found to have a strong influence on the speed of 
the chain convergence: the variation sequence has been un- 
der considerable study in the development of this method. 

Firstly, it was found that the local parameters for a par- 
ticular particle should be varied separately while holding 
the parameters of the other particles fixed and the global 



parameters fixed. We believe that this is a valid technique 
since any particle can be interchanged with any other par- 
ticle and therefore we can exploit this symmetry by not 
considering the full degeneracy of all of the particles pa- 
rameters. We do, however, vary the set of particle param- 
eters - position, temperature, elemental abundances, and 
Gaussian size- simultaneously since these parameters may 
be highly correlated. Varying each set of particle parame- 
ters separately speeds up the calculation by several orders 
of magnitude, as only the photons associated with that 
particle need to be resimulated. 

Secondly, we found that after changing the global pa- 
rameters the particle parameters needed to be varied for at 
least one iteration in order for the system to re-equilibrate 
and remain in the desired high probability region. This is 
necessary since there may be some correlations between 
the positions of the particles, for example, and global 
background parameters. Therefore, the particle param- 
eters must be allowed to move somewhat before rejecting 
a new set of global parameters. Without this considera- 
tion, we found some global parameters became pinned to 
sub-optimal values. 

Thirdly, we found it necessary to regenerate a com- 
pletely new set of photons for a particular set of parame- 
ters frequently. This is necessary due to the Monte Carlo 
nature of the calculation: there is statistical noise in the 
likelihood value itself. Although we believe the two sam- 
ple likelihood properly deals with this statistical noise in 
modifying the likelihood surface, it does not deal with the 
parameters drifting to a particular non-optimal value only 
because it received a favorable set of photons. Frequent 
regeneration of the photons simply cancels this effect and 
restores the mobility of the Markov chain. 

These three considerations lead us to the following se- 
quence of parameter iteration. Consider that we have n 
particles with local parameters, , for the ith particle and 
the jth particle parameter. Assume also that we have a 
set of global parameters gk for the kih. parameter. The 
following sequence of parameter variation is then used for 
each iteration: 

1. Vary values of to new values l'-'^ holding the pa- 
rameters of all other particles fixed, and compute 
the likelihood. Take the new values if the statistic 
conforms to the acceptance criterion. Repeat this 
step for each of the particles. 

2. Regenerate all photons for gu^l'j and compute the 
likelihood. 

3. Vary the values of gk to a new point g'j^. Hold the 
values of fixed at the previous values. 

4. Vary the values of to new values I'p ^ holding the 
parameters of all other particles fixed., and compute 
the likelihood. Take the new values, if the statistic 
conforms to the acceptance criterion. Repeat this 
step for each of the particles. 

5. Regenerate all of the photons g'j^ and possibly the 
new values of Compare the value of the likeli- 
hood at this point with that obtained in the second 
step. Take the new value of g'^ and values of if 
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it conforms to the acceptance criterion. Otherwise, 
return to the previous values of both gk and I'j at 
the second step. 

The absence of a Metropolis comparison in step 3 in- 
dicates a departure from the standard Markov chain: in 
fact, this step can be thought of as the splitting off of a 
new chain which explores the posterior conditional on the 
value of After some exploration of the local param- 
eter space, this chain is then compared with its parent 
(left behind at the known sample position gk) and the 
fittest (in the Metroplis-Hasting sense) survives in step 4. 
Strictly speaking this process does not conserve detailed 
balance: however, we have checked that the departure of 
this algorithm from the proper format does not affect the 
convergence on the target posterior distribution, at least 
in simple cases as we demonstrate in Appendix 2. 

2.4.4. Increase of the Over-simulate factor and 
Statistical Noise 

Since we have used a Monte Carlo calculation to 
calculate the likelihood statistic for the highly-complex 
smoothed-particle model, there is unavoidable statistical 
noise in the likelihood function. The same set of parame- 
ters will produce a slightly different value of the likelihood 
statistic every time they are simulated. This Monte Carlo 
noise can be reduced by increasing the "over-simulate fac- 
tor," the ratio of the simulated photons compared to the 
number of photons in the data set. In practice, a value of 
the over-simulate factor of 10 or so can be easily accommo- 
dated, but much larger values become tedious when work- 
ing on a single workstation. The statistical fluctuation in 
the two sample normalized statistic scales empirically as 



number. The evidence is the integral of the likelihood 
function L times the prior distribution, f{pi) over all 
parameters(pi): 



^L2 



number of bins 
over-simulate factor 



(16) 



such that the fluctuation can be reduced by both increas- 
ing the simulation factor and reducing the number of bins. 
Normally, this ratio is significantly larger than one, and 
therefore this is likely to significantly broaden the poste- 
rior distribution of any parameters. However, this merely 
makes the method conservative: the posterior distribu- 
tions we derive contain the smaller true distribution. This 
is illustrated in Figure [6l 

2.4.5. Particle Number and the Evidence 

Initially, we experimented with methods that may 
change the number of particles dynamically, during the 
evolution of the Markov chain. However, we found that 
any advantage that this method may have had, in reduc- 
ing the number of smoothed particles and their associated 
parameters, was outweighed by the complications added 
to the fitting procedure. In particular, it was difficult to 
determine what to do with the parameters of the parti- 
cles that were newly created or recently destroyed when 
the particle number changed. The use of adaptive steps 
within this scheme was also problematic. Therefore, we fix 
the number of particles and do not vary this during the pa- 
rameter iteration, instead opting for a more approximate 
approach to determining a suitable number of particles. 

We calculate (albeit approximately) the "evidence" of 
the smoothed particle model as a function of particle 



E = J l[dp,Lf{p, 



(17) 



The evidence works by penalizing overly complex mod- 
els (such as those with too many particles) wwith a lower 
evidence (see e.g. lMcKavl[2QQllO'Ruanaidh fc Fitzgerald! 



1 19961 [Marshall et al.l l2QQ3'. for introductions to the use of 
the evidence). We estimate the evidence by the method 
of th ermodynamic integration (|0'Ruanaidh fc Fitzgerald! 
"1996), raising the likelihood to some power during the ini- 
tial stages of the MCMC process and increasing this power 
steadily from to 1. In our studies, the evidence increased 
dramatically with increasing particle number for less than 
~ 100 smoothed particles but was then relatively fiat for 
larger particle numbers. This number of course varies from 
dataset to dataset: we use the smallest number of particles 
possible (indicated by the turnover in evidence, where the 
improvement in goodness-of-fit starts to become insignif- 
icant) in the interests of both the economy of hypotheses 
and also of saving computation time. 

The method requires one to choose both the over- 
simulate factor and the number of smoothed particles. The 
former is chosen to be as large as possible without the 
method taking too long, and the latter is chosen to be as 
large as possible given the criteria of the previous para- 
graph. Having finite values for the over-simulate factor 
limits the precision of the method, but this was found to 
be unimportant for many applications. Limiting the num- 
ber of particles via consideration of the evidence prevents 
fitting of the noise with an overly complex model. 

3. MODEL INTERPRETATION METHODS 

There is an impressive variety of techniques that can 
be applied to the results of the smoothed particle infer- 
ence. The output of the MCMC process consists of a set 
of parameters, both global and local, for the given num- 
ber of smoothed particles, for each iteration. Thus, the 
usual marginalized posterior probability distributions can 
be generated by histogramming the history of parameter 
values (see Fig. |4]). The interpretation of the smoothed 
particle parameters is not always straightforward, but a 
wealth of information resides in the MCMC samples. SPI 
is essentially a method for translating noisy X-ray data 
into uncertain emission components: the Markov chain 
process retains the correlations between components such 
that the manipulation of the samples can reveal much 
about the gross and detailed properties of the emitter. 
We discuss just some of the possible inference techniques 
below. As always, caution has to be used in only using 
samples taken after the chain has reached the stationary 
distribution. 

To demonstrate these inference techniques we have 
simulated a cluster model with two spatially separated 
isothermal components to which we apply our method. 
This choice of simplistic simulation reflects both our de- 
sire to demonstrate the algorithm on a relatively easily- 
interpreted problem, but also the difficulty of generating 
realistically complex mock datasets. To make a simple 
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demonstration of the features of the method we have cho- 
sen to simulate a double cluster with two isothermal com- 
ponents. This will test the ability of the method to esti- 
mate single temperatures as well as to separate different 
components in a dataset, both spatially and spectrally. 
The metho d is applied to r eal da taset s in two compan- 
ion pa pers, iPeterson et al.l (|2QQ6l ) and lAndersson et al. I 

In this work we perform separate inferences for the 
XMM-Newton EPIC and XMM-Newton RGS detectors. 
In the EPIC run we expect to be able to resolve the spatial 
structure of the clusters in detail due to the high spatial 
resolution (PSF FWHM ~ &') of the instrument datasets. 
For the RGS run the spatial resolution will be low in the 
dispersion direction of the spectrometer; however we ex- 
pect to resolve spectral features in finer detail. 

3.1. A simulated double- cluster model 

The spatial model of the simulated cluster consists of 
two beta-models, 

S{r)=Soil^^^J j (18) 

where S{r) is the source surface brightness at an- 
gular radius r and Tc is the angular core ra- 
dius. The beta models are separated diagonally by 
l^ Each of the two clusters is modeled spectrally 
with a MEKAL (iMewe. Gronenschild. fc van den Oord 
19851: iMewe. LemenTky^ d en Oord 1986; Kaastra 1992; 
Liedahh Osterheld, fc Goldsteinll 19951) model for the ther- 
mal plasma with Wisconsin ([Morrison fc McCammonI 
Il983l ) photo-electric absorption for the Galactic column. 
The cluster plasma temperatures were chosen separately 
for the EPIC and RGS simulations due to the different 
energy sensitivity of the two instruments. In the RGS run 
the two beta models have isothermal temperatures of kT 
= 1 keV and kT = 3 keV respectively whereas in the 
EPIC run these temperatures are at kT = 3 keV and kT 
= 6 keV . 

Metal abundances are set at 0.5 Solar, the redshift is 
fixed at z = 0.2 and both clusters are absorbed by a cor- 
responding Hydrogen column of 4 x 10^^ cm"^. The beta 
model parameters for the clusters were set to P = 0.7, 
Tc = 40'' and /3 = 0.5, rc = 15'' for the lower left and the 
upper right cluster respectively (see Fig. [7j). 

The numbers of simulated photons for the cluster model 
were 155000 for the EPIC simulation and 55000 for the 
RGS simulation, corresponding to those of a well exposed, 
bright cluster at z = 0.2. 

In both runs the number of plasma particles was set to 
700, a number chosen by a simple evidence argument (sec- 
tion I2.4.5p . The evidence was calculated using the first 
200 iterations running the method on the simulated EPIC 
data. Evidence calculations were made 5 times each for 
10, 50, 100, 500, 700 and 1000 particles (Figured]). We 
find that the evidence reaches a plateau or at least starts 
to level off at about 700 particles. 

Each plasma particle was assigned the same allowed 
range of spectral and spatial parameters, over which the 
prior pdf was uniform. 



The absorption was allowed to vary in the range; to 
1 X 10^^ cm"^. The metal abundances with respect to So- 
lar was varied locally between 0.0 and 1.0 Solar and the 
redshift of the plasma was fixed at z = 0.2. The particle 
temperature was assigned a uniform prior over the range 
kT = 0.5 keV to kT = 9.5 keV in the EPIC run and over 
the range kT = 0.5 keV to kT = 4.5 keV in the RGS 
run. The particles are modelled spatially using Gaussians 
with width whose logarithm was allowed to vary over the 
range 0.0 to 5.0 arcseconds. Particle positions were con- 
strained to a square area of 600" x 600" centered on the 
double-cluster centroid. 

The absorption was set to be a global parameter: the 
X-ray absorption was assumed to be Galactic only and 
approximately the same over the extent of the cluster. 
Plasma temperature is a local parameter, meaning that 
there can be as many temperature phases at a given iter- 
ation as there are particles. 

Each iteration chain was run for 1000 iterations. This 
number of iterations is assumed to be sufficient to sample 
the posterior distribution to an acceptable level of accu- 
racy. This choice is made to limit computing time to a 
practical level. 

The evolution of the Poisson per degrees of freedom 
minus one over the first 1000 iterations is shown in Figure [2] 
(left) along with the rejection rate (right). The rejection 
rate is shown for the global parameters (solid line) and lo- 
cal parameters for a sample of representative 10 particles 
(dashed lines). 

The renormalized log-likelihood can be seen to reduce to 
approximately 1 well before 200 iterations. Likewise, the 
rejection rate for both local and global parameters reaches 
(and then slightly exceeds) the canonical target value of 
0.77 at approximately this iteration number. At this point, 
and on average, any given parameter has its value changed 
only every fifth iteration. 

To ensure conversan ce of the chain we utilize the test 
suggested by Gelman in lGilks, Richardson fc SpiegelhalteJ 
(2003). This test involves running a number of chains with 
identical setups but different starting points. We show as 
an example 5 chains with the RGS version of the setup 
above for the column density parameter. Convergence is 
reached when a parameter 0^ ^ for chain i = 1 , . . . , m and it- 
eration j = 1, n has a variance between chains B similar 
to the variance within chains W: 

m 

B=^Y.^<P,-^f (19) 
m — 1 ^-^ ^ ^ 

^ m 

T^=-^.f (20) 

where 4>i = ^J2]=i^ij^ 4> = ^Y7=i4>i. = 
7^ Zlj^i (^ij ~ ^i)' these variances, an overestimate 
of the variance, ySt {(j)) is formed 

77 — 1 1 

w((/)) = W^-B. (21) 

n n 

The underestimate of the variance W and vSr {(j)) will 
approach the true variance from opposite directions as 
n ^ oo. Convergence is established when the 'estimated 
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Fig. 1. — The logarithm of the evidence (section |2.4.5|) calculated using the first 200 iterations for the simulated EPIC model data. The 
calculation was done 5 each times for 10, 50, 100, 300, 500, 700 and 1000 particles. The average and standard deviation of the evidence for 
the 5 runs is shown as stars with error bars. 




Fig. 2. — The Poisson (see section [2?3]) per degrees of freedom minus one (left) and rejection rate (right) for global (solid line) and a 
subset of local (dashed lines) parameters as a function of iteration number. 



potential scale reduction' R: 

V-R=^ (22) 

is below 1.1. ^ 

In calculating the above variances and R at iteration i 
we discard the first i/2 iterations. In our saniple of chains 
the result for uh is shown in Figure [3] with ^ shown with 
Y^var {(j)) errors (top) and R (bottom). A value of R below 
1.1 was reached at iteration ~ 400. 

Since the Gelman test uses only iterations from z/2 to 
i and convergence is reached at iteration i = 400 the test 
shows that chains are more or less identical over the range 
200 to 400. Therefore the chain was assumed to have 
reached a stationary distribution after 200 iterations, and 
consequently only the last 800 iterations were used to infer 
the properties of the model. 

3.2. Marginalized Distributions 



The marginalized posterior distributions for a parame- 
ter can be estimated simply by constructing the histogram 
of the sample parameter values. One can use either all 
smoothed particles, or some suitable subset allowing the 
cluster to be dissected in many different ways. The par- 
ticles can for instance be selected for a certain range of 
sky positions in order to view the distribution of temper- 
atures in different parts of the cluster. The width of the 
one-dimensional histogram gives an estimate of the uncer- 
tainty on that one parameter, with the caveat that the 
fluctuation in our Monte Carlo is likely to broaden the 
distribution by approximately the factor, ctlstv, as in the 
previous section. 

For global parameters, like the X-ray absorption in this 
case, these distributions are straight-forward to interpret 
(see e.g. Figure HI left panel). However, for local parame- 
ters such as the plasma temperature these histograms will 
contain the distribution of that parameter over all parti- 
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Fig. 3. — Evolution of the nn parameter in the Gelman convergence test. The figure shows ^ with ^/var {(p) errors (top). The estimated 
potential scale reduction R is shown below. 
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Fig. 4. — Parameter distributions for X-ray absorption (left) and ICM temperature (right). The absorption is a global parameter and equal 
for all particles while the temperature is a local parameter with different values for all particles. The top panel shows the results of the EPIC 
run and the lower panel contains the RGS run. For the plots of ICM temperature the blobs were selected in 20'' x 20'' boxes centered on the 
cold (solid line) and hot (dashed line) clusters. 



cles in addition to the local uncertainty and Monte-Carlo noise. 

In Figure [H (left panel) we show the distribution of the 
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Fig. 5. — Comparison of the fluxed spectra of the data (dashed) and model (sohd) using models from iteration 200 to 1000. The flux 
(photons cm-2 s-^ keV-^)) for the EPIC run is shown on the left and for the RGS run (photons cm ^ s -"^ A ■'^) on the right. 
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Fig. 6. — Shown is the resolution of the 1 keV temperature component from the RGS run for different values of the over-simulate factor. 
The lines respresent over-simulate 1 (solid-thin), 3 (dashed-thin), 10 (dot-dashed), 30 (solid-thick) and 100 (solid-dashed). The necessity of a 
high over-simulation (~ 10) for these data is clear. 



X-ray absorption in units of 10^^ cm~^ for the last 800 it- 
erations of the EPIC (top) and RGS (bottom) runs. Both 
pdf's comfortably accommodate the true value of 0.04. 

The right panel of Figure [H shows the distribution of 
plasma temperatures for all particles over the last 800 it- 
erations for the EPIC (top) and RGS (bottom) runs. In 
order to view the separate distributions for the simulated 
clusters the plasma blobs were selected in 20'' x 20'' square 
regions centered on each cluster. The figure shows the dis- 
tributions of the cold (lower left) and hot (upper right) 
components as solid and dashed lines respectively. The 
distribution from the EPIC run shows the difficulty of de- 
termining plasma temperatures from X-ray spectra. There 
is a high level of degeneracy between neighboring temper- 
atures. We have allowed for 700 different phases at any 
given iteration, making it possible (and indeed very prob- 
able) that the data from a single phase plasma will be 
satisfactorily fit (and so returned by the sampler) with a 



wide range of phases. The peaks at kT = 3 keV and 
kT = 6 keV are not individually distinguishable but are 
both broad distributions with significant probability mass 
extending over the entire allowed interval. The prior has 
been updated by the data though: the temperature dis- 
tributions are not uniform, and do actually contain useful 
information on the cluster temperature structure. In the 
next section we derive a spatially- varying measure of tem- 
perature from the median of this distribution at all spatial 
positions, and show how this process recovers the input 
temperatures. 

In the RGS run the kT = 1 keV peak is reproduced 
more precisely whereas the kT = 3 keV component is 
represented as a combination of particles at approximately 
kT = 2-4.5 keV . The RGS detectors lack sensitivity above 
energies of 2.25 keV, making them makes them unrespon- 
sive to the hot cluster plasma. At low energies though, the 
high spectral resolution of RGS makes it an excellent in- 
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strument for measuring cluster temperature distributions. 

3.3. Parameter Sky Maps 

Spatial maps of quantities can be constructed by a num- 
ber of methods. Working with particles in a luminosity ba- 
sis, we can reconstruct the surface brightness of the cluster 
by computing 



27raf 



(23) 



where Li is the luminosity normalization for the ith par- 
ticle, ai is the spatial width for the ith particle, and pi is 
the projected spatial position. Averaging the maps over 
the samples then provides an image representing the par- 
ticles' posterior PDF. Maps generated in this way are, to 
within the precision given by the noise and the accuracy 
dictated by the SPI model, corrected for detector expo- 
sure, vignetting, and PSF convolution, all of which distort 
the raw counts image. Furthermore, the particles are best- 
constrained where the signal-to- noise is highest: the SPI 
average intensity maps are multi-resolution image recon- 
structions. 

Likewise, luminosity-weighted parameter maps (e.g. 
temperature or abundance maps) can be calculated by 
computing. 



T{x,y) 



9^^ 



L{x,y) 



(24) 



where is the value of the parameter (e.g temperature or 
abundance) for the ith particle, Li is the luminosity nor- 
malization, ai is the Gaussian width, and pi is the spatial 
position. 

Note that the individual sample maps, while consistent 
with both the data and the noisy simulation, may not pro- 
vide a reconstruction of the cluster satisfactory to the eye 
of the analyst. For this, some sort of averaging is required, 
preferably over a large set of samples (just as when deal- 
ing with any uncertain measurements). Such maps are 
discussed in the following section. 

3.4. Median and Percentile Maps 

The median map can be computed by taking the me- 
dian of the Lj{x^y) or Tjix^y) over all iterations j in the 
stationary part of the Markov chain. Median luminosity 
maps are shown in Figures [3 and [8] for the EPIC and RGS 
runs respectively. As can be seen in the maps from the 
EPIC run the particles start out as a completely random 
distribution but after only 100 iterations the median map 
exhibits clear double cluster structure. The median lumi- 
nosity map of the last 800 samples in the stationary part 
of the Markov chain is indistinguishible from the input 
model. In the RGS run the clusters can be seen to be re- 
solved approximately in the cross-dispersion direction after 
1000 iterations. 

Likewise, median maps of luminosity- weighted temper- 
ature are shown in Figures [9] and [10] for XMM-Newton 
EPIC and RGS runs. It is clear from the sequence of plots 



in figure [9] that it is more difficult to estimate spatially re- 
solved cluster plasma temperatures than a luminosity dis- 
tribution. However, in the median map at iteration 1000 
the two isothermal components are clearly separated; even 
though the map is based on the medians of very broad dis- 
tributions (section [3r2|) we recover approximately the input 
temperatures of the two components. Clearly, as seen in 
the maps based on the XMM-Newton RGS simulated data 
(Figure [TQ|) creating spatially-resolved maps using a spec- 
trometer alone is more difficult; however, there is spatial 
information there which is extracted with the SPI method. 

Similarly, percentile maps can also be constructed in- 
stead of just the median. The difference between say the 
84th percentile and 16th percentile map gives the 1 cr er- 
ror map of a quantity in each point {x^y) with the caveat 
that the neighbouring pixels are highly correlated due to 
the underlying smooth particle structure. This means that 
the error map is an overestimate of the uncertainty on the 
pixel values. For a temperature map, for example, this 
procedure gives the 1 a error of the luminosity weighted 
temperature in each ix^y) bin. Such a map, created from 
the 800 final samples in the EPIC run, is shown in Fig- 
ure [11]). The map displays a 1 a uncertainty of kT ~0.5 
keV for the bottom left (cold) component and kT ~1 
keV for the top right (hot) component. 

This, again, demonstrates the subtle difficulties associ- 
ated with estimating high plasma temperatures given the 
currently available instruments. That the error on the me- 
dian appears to decrease to a small value at spatial points 
far away from the cluster center is simply due to the fact 
that for those distances the simulated data contain a very 
small number of photons: any estimated temperature dis- 
tribution will default to the applied uniform prior. The 
median of this uniform distribution will of course be close 
to the central value of the range and not vary much in the 
evolution of the Markov chain. 

Of course, these percentile maps give only an estimate 
of the uncertainty of the median value. If we want to mea- 
sure the width of the intrinsic range of temperatures at 
any given sky coordinate, should such a distribution exist, 
we have to do something more sophisticated. One meth- 
ods to visualize parameter ranges is demonstrated in the 
next section. 

3.5. Filtered Maps 

Another useful way of examining the posterior pdf of a 
given spatially- varying quantity is to segregate the parti- 
cles into different categories before constructing the lumi- 
nosity maps. A luminosity map can be constructed from 
particles with temperatures between say 1 x 10^ K and 
2 X 10^ K (kT = 1-2 keV ). This can then be compared 
directly with other luminosity maps of other temperature 
ranges to study the varying spatial distribution of the dif- 
ferent plasma temperatures. To demonstrate this two fil- 
tered luminosity maps were created, again using the final 
800 samples of the EPIC chain (Figure [T2]). These maps 
were filtered for temperature ranges kT = 0-4.5 keV (left) 
and kT = 4.5-9 keV (right) respectively. This type of lu- 
minosity map, filtered on plasma temperatures, would be 
impossible to produce using any standard spectral analy- 
sis: it requires some kind of forward-fitted spectral-spatial 
multicomponent model. 
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Fig. 7. — 500^^ x 500^^ sky-maps of ICM bolometric luminosity (arbitrary units) after 1, 10, 100 and 1000 iterations (top left to bottom 
right) centered on the double-cluster centroid. The luminosity is calculated according to Eq. [23] and the median over all iterations is taken 
in each ^" x b" pixel. The maps are based on the data from the EPIC run. 



Three-color RGB maps can be computed by using fil- 
tered luminosity maps with a different temperature range 
for each color. In Figure [13] such an RGB map is displayed 
using luminosity maps filtered in three ranges of temper- 
ature of equal size. In principle, this map contains more 
information since it describes the width of the temperature 
distribution as well as the mean temperature. 

4. CONCLUSIONS AND FUTURE WORK 

To summarize and conclude: 

1. Motivated by the complexity of modern X-ray data 
on clusters of galaxies, and the spatially- and 
spectrally- varying nature of both the emitting sys- 
tem and the instrument response, we have de- 
veloped a highly flexible forward-folding analysis 
scheme, dubbed smoothed particle inference. 



2. Deconvolution in the presence of high Poisson noise 
is achieved by Monte Carlo simulation of mock data 
and comparison of this with adaptively-binned raw 
counts with a two-sample likelihood function. 

3. Clusters are modelled with a linear combination 
of Gaussian emission profiles, with variable posi- 
tion, size, temperature and abundance; these local 
parameters, and the global parameters associated 
with the background emission and the galactic ab- 
sorption, reside in a highly multi-dimensional space 
which we explore with a Markov Chain Monte Carlo 
sampler. 

4. Testing this methodology on simulated data shows 
that the input parameters can be recovered reli- 
ably, with precision limited by the shot noise in the 
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Monte Carlo mock data generation. 

5. SPI has been demonstrated to open up a variety 
of novel analysis pathways: the translation of noisy 
data into uncertain particle parameters brings the 
underlying physical distributions a significant dis- 
tance closer. 

Multiple avenues can be taken in extending this work 
further. Firstly, since we have (iteratively) propagated 
a complex model through the XMM instruments, it is 
straightforward to propagate the same model through 
multiple instrument responses simultaneously. Therefore, 
joint EPIC, RGS, Chandra, and Astro-E analyses, all 
of which would benefit from a careful treatment of the 
spatial-spectral response, may become routine. Secondly, 
we anticipate that simultaneous X-ray, SZ and lensing 
analyses could greatly benefit from a many-parametric ap- 
proach such as this: simpler analyses under-use the in- 
formation in the X-ray data, allowing biases to creep in. 
Thirdly, we expect SPI analysis of low signal-to- noise data 
sets, such as those obtained from X-ray cluster surveys, to 
be very efficient: since the computation time scales with 



the number of photons, analysis of simple data sets would 
be extremely fast. The retention of maximum information 
in this data regime is clearly of much interest. 

Fourthly, the particle positions can also be deprojected 
in similar ways to standard deprojection techniques in or- 
der to model clusters in 3D: in a future paper we will show 
how we do this using SPI, and retaining the full 2D clus- 
ter asymmetries. Fifthly, we anticipate that a variety of 
hydrodynamical constraints could be applied to limit the 
vast parameter space volume that is being considered. To 
date we have used only uninformative priors, as we at- 
tempt to move from data space to particle space; the use 
of physical priors on the gas pressure should aid the recon- 
struction of the cluster plasma distribution. Finally, this 
work can be seen as the beginnings of a bridge to theo- 
retical physical models of cluster formation and evolution; 
one can envisage such models being developed under con- 
straints imposed by the data in something like the method- 
olgy outlined here. Smoothed particles sit very well in the 
hierarchical picture of cluster formation: we have good 
reason to believe that they may be the optimal basis set 
with which to reconstruct their images. 



APPENDIX 

DERIVATION OF LIKELIHOOD FUNCTION 

The basic datum is a number of photons in a three dimensional bin (i^i, U2^p). The bin size may be set by physical limits 
such as the extent of the CCD pixel, or be imposed given some understanding of, for example, the spectral resolution 
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Fig. 9. — 500^^ x 500^^ sky-maps of ICM temperature (keV) after 1, 10, 100 and 1000 iterations centered on the double-cluster centroid. 
The temperature is calculated according to Eq. 1241 and the median over all iterations is taken in each ^" x ^" pixel. The contours show the 
luminosity obtained from the EPIC run after 1000 iterations. Based on data from the EPIC run. 



of the instrument. The simulated photons are then binned on the same grid. Assume that we have n photons in the 
observed dataset, and have simulated m photons from our model: in the i^^ bin we have Ei simulated photons, and Oi 
observed photons. Since the photons falling into a set of bins will follow a multinomial distribution, we can use the usual 
multinomial likelihood function 

^i="'n^^ (Ai) 

i 

where we have used ^ as a proxy for the actual probability of finding a photon in the zth bin. In the limit of large m, 
this will approach the converge to the true probability of the model. 

A more sophisticated likelihood function is one that attempts to include the error induced by the Monte Carlo noise on 
the true probability. We then calculate, L2, which is the product of two likelihood functions for both the detected and 
simulated photons (i.e. a two sample likelihood function) 
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Fig. 10. 600'' x 300'' sky-maps of ICM temperature (keV) after 1, 10, 100, 1000 iterations (top left to bottom right) for the RGS run. 
The contours show the luminosity obtained from the EPIC run after 1000 iterations for comparison. The particles are smeared due to the 
uncertainty on the position in the dispersion direction of the spectrometer. 




Fig. 11. — 500" x 500" sky-map, showing the width of the distribution for the median temperatures from iteration 200 to iteration 1000. 
The map depicts the difference of the 84th and 16th percentile of the distribution of the luminosity weighted temperatures for each iteration 
at any sky-coordinate. 



n\m\ 



n 



\^ m+n J y 



m+n 
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Fig. 13. — 300^^ x 300^^ RGB sky-map of ICM luminosity (left) where each color represents the luminosity of particles in different ICM 
temperature ranges. Here red represents particles with kT = 0-3 keV , green; kT = 3-6 keV and blue; kT = 6-9 keV . In the plot to the right 
the green component has been removed to make the difference of the hot and cold components clearer. 



where we have used as an estimate for the probabihty in the ith bin. 

Another approach would be to not assume that the probabihty, pi is estimated by ^^"^^ , but instead integrate this 
probabihty distribution over a prior distribution for that probabihty. This is expressed as, 

Ls = n\m\ n I ^^fiPi)dP^ (A3) 

where f{pi) is the prior distribution. We have not, however, been able to obtain both an analytic and properly normalized 
expression for L3, in the case where the numbers of simulated and observed photons are constrained to be the same. 
Therefore, we use L2 and note that in the limit of large m these expressions will be identical. This is indeed the case for 
unconstrained photon numbers where the Poisson distribution is appropriate for the likelihood of the counts in each bin. 
Future study, however, may develop these statistics further. 

It is also possible to normalize these likelihood functions, such that they will approximately follow the distribution. 
This is done by dividing by the same expression with the probability estimates replaced by the actual number of counts. 
So the normalized expression for L2 can be written as 
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and for L2 it is 

Note that —2 logL^ and —2 logL2Ar will be distributed as with the number of degrees of freedom equal to the number 
of bins. We use L2N for all future analysis. In the appendix we demonstrate how this statistic approaches the two sample 
statistic that we used in the large m and n limit. 

Following, we demonstrate how the likelihood function, we described in §2.3 approach the two sample distribution. 

Oi+E. 

( CA±^ ) 
-|— p \^ m+n J 

2N 



\ n J \ m J 

Note, if we take the logarithm of the above equation, we obtain 



Rewriting this. 



Oj+Ej Oj+Ej 

\0gL2N = ^O.logn^^ + ^,logm^^. (A7) 



( ^ o.+£. _o.\ 

log L2^ = ^ log f 1 + ^ j (A8) 

+ ^aog l + ^2±^ -\. (A9) 



1 LJ. L — ( ) ■ m i—l L — J-iJ^ 



Expanding these around — ^^^^^^ = ei ^ and — ^^^^^ = 62 ~ 0, we obtain 



log L2N = Y.Oi (ei - I) + i3i (e2 - I ) (AlO) 



Exponentiating this equation gives, 

L2N = Y[e e e"^^-°"e™^^-^- (All) 



\ ^ m-\-n } V m-\-rL j 



The last two factors multiplied together are one, so we obtain. 



n_ V ^ m + TX / _ V ^ m + n, / 

e 20, e iE^ (A12) 



when both m and n are large. For small e\ and 62, this is also equivalent to 



\ * m-\-n ) V ?Ti, + ri / 



and this is exactly equal to 



L2N = e ""2^ . (A14) 

where X2S sample function that was used in lPeterson. Jernigan. fc KahnI (|2QQ4l ). Its reduced value would 

be close to one for a good fit. 

APPENDIX 
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TEST OF PARAMETER UPDATE METHOD 

In order to determine if the interation sequence we used in §2.4.3 is vahd and approximately obeys detailed balance we 
performed several simulations. We use the double isothermal beta model in §3.1 as the input simulation. We then fit this 
same model to the simulation instead of using the smoothed particles. In this way, the model is much simpler and only 
has the following 15 parameters, column density, the two temperatures, core radii, the positions (4 parameters), the 
two abudances, and overall normalizations. With this few numbers of parameters, we can update the parameters in three 
different ways. First, we update the parameters treating all the parameters as global parameters defined in §2.4.3. This is 
a proper Markov chain obeying detailed balance and all parameters are modified simulataneously in a single Markov step. 
Second, we update the parameters for one beta model as global parameters and the other parameters as local parameters. 
Third, we update all the parameters as local parameters. The second two approaches are not standard Markov chains, 
but essentially create small Markov chains at each iteration for some of the parameters. 




Fig. A14. — Posterior distributions of temperature (right) and absorption (left) using three different parameter update schemes; all global 
(dotted), mixed global and local (thin dashed) and all local parameters (thick dashed). 

We display the posterior distributions of the temperature and absorption parameters in Figure IA14[ In all three 
simulations the posterior distribution is identical. This demonstrates that the various iteration scheme approximately 
obey detailed balance. This suggests that it is not critically important what scheme we use and works in the limit 
that the posterior distribution becomes stationary. These simulations also demonstrate that the posterior temperature 
distribution is smaller when the model is constrained to only consist of two temperatures than the other simulations with 
smoothed particles because of the simplicity of the model. The posterior distribution is smallest at low temperatures 
because of the nature of X-ray spectra. 
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scientific and calibration support of the RGS at Stanford. This work was also supported in part by the U.S. Department 
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